From the GDPR (Hoofnagle, Sloot and Borgesius, 2019) and its antecedents, a number of concepts have been established which are relevant to this thesis, specifically (Information Commissioner’s Office, 2014; The European Parliament and the Council of the European Union, 2016):
The terms Subject Access Request and Data portability are used in Case Study Two, and referenced also in Chapter 7.
For simplicity, this thesis uses everyday layperson-friendly terms rather than the legal terms defined in this section. Data subjects are referred to simply as individuals and both data controllers and data processors as data holders, because for this thesis, focusing as it does on the individual perspective, there is no need to draw a distinction between data controllers and data processors.
By removing the filter layer from an old monitor and modifying cinema IMAX glasses, I created a monitor that could be viewed only by the person wearing the glasses, which would be ideal for interviewing someone about their data while respecting privacy. Face-to-face interviewing had to be abandoned due to COVID-19, so this technique was sadly never used in practice.
The table below illustrates the types of family civic data identified in the pilot study [3.4.1; Bowyer et al. (2018); Appendix A], and referenced in Case Study One [4.2.1].
| Category | Type of data | Examples/Details |
|---|---|---|
| Family | Personal details | Date of birth, address, telephone number. |
| | Relationships | Marital status, exs, step-parents, living arrangements. |
| | Children | Parentage, adoption, fostering, childcare. |
| Education | School Records | Attendance (truancy), special needs. |
| | Academic Results | SATs, reports, exam failures, training courses. |
| Welfare | Social Support | Social worker visits & notes, details of family crises, interventions, allegations. |
| | Welfare Benefits | Jobseeker’s Allowance, child support, Disability Living Allowance, tax credits |
| Money/Work | Family Finances | Salary, savings, credit cards, spending, debt |
| | Employment | Job history, periods of unemployment, performance at work, NI, PAYE, pensions. |
| Civil | Housing data | Council house provision, eligibility criteria. |
| | Legal documents | Birth / marriage / death certificates, citizenship /immigration status, work permits. |
| Crime | Criminal records | Arrests, cautions, offenders’ registers, prison time, speeding tickets, spent convictions. |
| | Court orders | Restraining orders, lawsuits, custody, ASBOs. |
| | Domestic Violence | Allegations made, medical records, social / legal interventions, victim support. |
| Medical | GP records | GP’s notes, prescriptions, tests, referrals. |
| | Hospital records | Operations, hospital stays, emergency care. |
| | Medical conditions | Diagnoses, diseases, allergies, blood type. |
| | Mental health | PTSD, breakdowns, depression, sectioning. |
| | Addictions | Substance abuse, gambling, rehab, crime. |
| Leisure | Library Usage | Books/CDs borrowed, computer access. |
| | Sports & Health | Gym usage, class attendance. |
| | Shopping Habits | Loyalty cards, store & online purchases. |
| | Transport Data | Buses used, ANPR tracking, walking patterns. |
In this section, additional details are provided on the Sentence Ranking exercise referenced in 4.2.6.
The sentences offered to participants across the 4 workshops were as follows:
S1 A family’s data should all be joined up and looked at together.
S2 Any information from more than 5 years ago should be hidden from staff.
S3 Asking families for consent to share data just once at the start is enough.
S4 Councils should treat families like people, not records in a database.
S5 Families don’t want to be responsible for looking after their data.
S6 Families find setting privacy preferences to be annoying and tedious.
S7 Families should always be able to talk to someone from the authorities about their data.
S8 Families should have rights to see their data and how it is used.
S9 Families will be willing to spend time checking their data is correct.
S10 Families won’t mind lots of data being collected about them if they can see it.
S11 Families’ data should be private unless they say it can be shared.
S12 Information stored about families must be fair and accurate.
S13 It is important for support workers to know mental health details.
S14 Just looking at data doesn’t tell you everything about a family.
S15 Labels like ‘domestic abuse’ are damaging to families & hard to shake off.
S16 Numerical scores are a good way to compare the progress families have made.
S17 Officials should be able to see historical records about families.
S18 Public sector officials can make good judgements just by looking at families’ data.
S19 Support workers make better decisions if they have more data about a family.
S20 Support workers should be able to see family medical records.
S21 The police should be able to see all of a family’s data.
Where participants unanimously or mainly disagreed with a sentence, it is referenced in the inverse using a prime notation, e.g. S18', which would imply a reference to the opposite of the statement - in this case ‘Public sector officials can not make good judgements just by looking at families’ data.’
In each of the workshops, families ranked the sentences according to:
This produced numerical ranking data which was analysed as follows:
Sentence rankings were encoded on two scales. Sentences which contained a negative statement were inverted so that disagreement with them could be considered as agreement with a positive statement.
Rankings from different groups within workshops were aggregated, using mean averaging, with a weighting to ensure each workshop contributed equally regardless of attendance.
This gave four values for each sentence, for each participant group (families only, staff only, and combined). Variance can be understood as an inverse measure of ‘unanimity of opinion’: a variance of 0.0 indicates total agreement, while 1.0 would indicate maximal disagreement.
Prioritising variance in agreement over variance of importance, the four dimensions were reduced to three to allow a visualisation to be produced.
The resulting visualisation is shown in Figure 4.1.
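The aggregation steps above can be sketched in code. This is a minimal illustration only, not the analysis script actually used. It rests on several assumptions that the text does not state: that scores on each scale are normalised to the range 0 to 1, that a negative statement is inverted as 1 minus the score, that the four values per sentence are a mean and a variance on each of the two scales, and that variance is divided by its maximum possible value for that range (0.25) so that it runs from 0.0 to 1.0. All function and variable names are hypothetical.

```python
# Illustrative sketch only: names, score range and normalisation are assumptions.

MAX_VARIANCE = 0.25  # largest possible variance for scores confined to [0, 1]


def equal_workshop_weights(scores_by_workshop):
    """Pair each score with a weight so that every workshop carries the same total weight."""
    attended = [scores for scores in scores_by_workshop.values() if scores]
    pairs = []
    for scores in attended:
        # A workshop with more groups gets a smaller weight per group.
        weight = 1.0 / (len(attended) * len(scores))
        pairs.extend((score, weight) for score in scores)
    return pairs


def summarise(scores_by_workshop, negative_statement=False):
    """Return (weighted mean, normalised weighted variance) for one sentence on one scale."""
    pairs = equal_workshop_weights(scores_by_workshop)
    if not pairs:
        raise ValueError("no scores supplied")
    if negative_statement:
        # Disagreement with a negative statement counts as agreement with its positive form.
        pairs = [(1.0 - score, weight) for score, weight in pairs]
    mean = sum(score * weight for score, weight in pairs)
    variance = sum(weight * (score - mean) ** 2 for score, weight in pairs)
    return mean, variance / MAX_VARIANCE


# Three groups in one workshop fully agree; a single group in another fully disagrees.
example = {"workshop_1": [1.0, 1.0, 1.0], "workshop_2": [0.0]}
mean_score, spread = summarise(example)
print(mean_score, spread)
```

In this made-up example an unweighted mean would be 0.75, because the better-attended workshop dominates; the per-workshop weighting pulls it to about 0.5 and reports the spread as maximal. Calling `summarise` once per scale would give the four values per sentence under the assumptions above.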
Drawing from the world of film production, storyboarding is a well-established technique in participatory design (Spinuzzi, 2005; Moraveji et al., 2007). Usually it involves the participants drawing out a series of sketches in the form of a comic strip ‘telling the story’ of an interaction, encounter or activity. However, it had already been determined, both in terms of the research approach of this thesis [3.2.2], and in terms of responding to participants [4.2.6] that it would be more important to understand the interpersonal interactions between family and support worker and the actual actions performed upon or with data, rather than the mechanisms by which the data interaction would occur. Focusing on the visual aspects of information visualisation could be distracting. Therefore, I developed a novel technique for use in the phase 2 workshop: Storyboarding Action Cards. Each storyboard card denotes a possible action that can be carried out by a family member (yellow border), support worker (blue border) or an action performed together (green border). Each card includes a simple action summary such as ‘Give Information’ and an iconographic representation of the action, along with a short description of which actor is doing what. It includes blank lines which the participant can ‘fill in’ to describe the specifics of this occurrence of the action.
Based on the accumulated knowledge of Early Help processes amongst myself and SILVER colleagues, enhanced for this purpose through consultation with a former social worker, I developed a total of 43 different cards to represent the suite of possible actions that would be interesting to track. These are grouped into eight different types of card:
The intent behind the storyboarding action cards is that they serve as both a boundary object and things to think with, as with the Family Data Cards described in Bowyer et al. (2018), to provoke discussion among participants. They have an additional function over the Family Data Cards, however: they can be arranged in a sequence, much like a storyboard or comic strip, and filled in, to tell the story of exactly who would do what and how in the process of a support conversation involving shared data interaction. In this way they lend themselves to modelling processes rather than object design. Figure ARI4.1 shows an example of three cards having been filled in and arranged in sequence to tell a simple story of a worker seeking out an address following new information from the family member.
In addition to the storyboard cards, I also designed ‘backing mats’ for each of the eight card types. These were printed on large coloured card corresponding to each card type’s backing colour, and provided areas for the ‘decks’ of available cards to be picked from. Each backing mat provided a separate home for family member actions, staff actions, and joint actions. Additionally, each backing mat included a summary of the available action cards of this type, and a prompt question. An example of a backing mat, in this case for Problem Cards, is shown in Figure ARI4.2.
Introduction and Practice
To familiarise the participants with the storyboarding action cards and the available actions, participants were first presented with an introduction to the storyboarding concept, as used in film-making and participatory design; then the card design and intended usage were explained. A very simple scenario of a family going through a breakup was used to talk through an illustrated example of how to map out the subsequent worker/parent conversation using the action cards. Participants were then invited to practise mapping out the same scenario themselves; however, this time they were to map out a ‘problematic’ version, where things do not go so smoothly.
Scenario-Based Storyboarding Discussions
After the participants were acquainted with the cards and had practised the storyboarding method, the main activity began; this took up the majority of the time in the session. It involved each group mapping out two stories for a more substantial scenario: one version where things go smoothly and another ‘negative’ version where they do not. It was highlighted to participants that the aim was to identify what would or should happen at each stage, and why.
The scenarios used for this activity by the two groups were (a) a new scenario where a couple is looking at their historical medical records (which contain various matters of concern such as missed appointments and historical mental health issues) and (b) a ‘labels and judgement’ scenario that had been used in the phase 1 workshops. Additional scenarios were prepared but not used. The layouts of the completed storyboards were photographed for reference, and to provide context during analysis of the discussion transcripts.
For a completed storyboard layout example, see Figure 3.10.
Quotations included in section 4.3 are referenced using the following notation:
The number after FQ/CQ/SQ provides a unique identifier for each quote. Individual speakers are identified only by their role. Within each quote, or in brackets afterwards, the speakers are identified as Worker, Parent, Child, or Researcher.
Most quotes and conversation extracts are directly embedded into section 4.3. All other quotes referenced in the text (excluded for reasons of space and flow) are included in ARI4.5.
The majority of quotations and conversation extracts in Case Study One are embedded inline throughout section 4.3. The quotes listed here were referenced in the text but excluded for reasons of space and flow. The list also includes some quotes or extracts which were abridged in the Chapter body but are included in full here.
FQ1 [Researcher(A), Parent(B) & Daughter(C)] A: “So [you think that she should be able to be] selective about the things she wants her worker to know and leave out things that she doesn’t?” B: “Yes, like only her mental health and what tablets she’s on and things.” C, talking to B: “It sounds like you.” […] B: “If she trusted her worker, I think she’d tell her herself though.” A: “Do you think that makes a big difference?” B: “I had a worker and my daughter didn’t like her and it made it really difficult when she came out. But she likes the new one.” C: “I don’t.” B: “Why?” C: “She’s annoying.” A: “So do you think the relationship makes a difference to how much you tell?” C: “Yes. Because if you don’t like them, why should you tell them?”
FQ3 [Researcher (A) & Parent (B)] A: “What do you think could be done? What would help [this family] feel a bit happier?” B: “Give them a one-to-one support worker who they can build up a trust and understanding where you feel like they’re not going to share your information. I don’t know, maybe come up with a computer thing so you [the family] know what they’re [the workers] putting in or maybe sign paperwork [to give your approval].”
FQ6 [Parents] A: “It’s so hard because we’ve all done things in our past […]” B: “I think for him to see [old medical records] the doctor should have requested it, it shouldn’t just be there for him to see. I don’t know, if he was going for some mental health problems or something and then [he can] look back… […] It should be like you have to request to look at that data. I know when I’ve been to the doctors and they actually go into a different part of the system to find my old records, which I think is a bit bad. It shouldn’t just be there.”
FQ9 [Child] “I [designed] a graph to show how you are feeling day by day.”
FQ11A [Parent] [discussing the sentence ‘Numerical scores are a good way to judge a family’s progress’] “No, I disagree, because just anybody can tick any numbers. You could have a good day, you could have a bad day.”
FQ11B [Researcher (A) & Parents (B, C, D)] A: “Do people have a right to know [past incidents with police]?” B: “Not really. The past is the past, isn’t it?” C: “No, because…” D: “You shouldn’t be judged on your past, but I think it should be there [accessible in the data] because I think at the end of the day, you can fall back into old ways. The thing is, if you’re putting a child at risk or a person at risk, I think you [the worker] need to know everything, don’t you?”
FQ12 [Parent] “[The parent could] countersign. [The worker would] say, ‘I feel that we’ve talked about this today so I’m going to write that down. I’m going to show you. Can you sign and me sign if you’re happy and I’m going to share this.’ That’s a bit different [better]”.
FQ15 [Parent] “You would think that it would help with your benefits, [that] you wouldn’t mind sharing your data, would you, because they [support workers] are trying to help you. It’s not like they’re saying, ‘Well she gets too much money,’ They’re not trying to cut [families’] benefits, they’re trying to help [families].”
FQ16 [Parent] “[Families need to] feel they’re being involved. […] [We need to be able to] sit together and say, ‘Right, that’s the information I’ll allow you to share. I don’t want that bit shared. But this bit, because it will help me and the family […]’. Say in this [scenario] family, she might have been married before and had domestic violence so she doesn’t want that bit shared, that’s in the past. So it’s [only] certain up-to-date information about the family [that would be shared] because this [the family suggested by the data] isn’t her family.” [Parent, SQ76]
FQ17 [Worker (A) & Parents (B, C)] A: “‘Families don’t want to be responsible for looking after their data’?” B: “It’s one of those things where …” C: “You’ve enough on without all that.” A: “You just don’t think about it.” C: “And if you were to think about it, would you actually do anything?”
SQ3 [Workers] A: “I think we would have to see all the data.” […] B: “If you’re going out to visit a family, you don’t know what you’re going to.” A: “It’s about protecting ourselves as well.” B: “Yes, we have to check for markers, potential violence, things like that.”
SQ4 [Worker] [imagining an interface that would allow workers to see missed appointments] “Often they can lie to you, can’t they, and say, ‘Well yes, I’ve been to the doctor. Yes, I’ve been to the dentist. Yes, I’ve done that and yes, I’ve done that.’ But then [with this] we’ve kind of got the proof.”
SQ5 [Worker] “[a benefit of having family’s data is that] families don’t have to tell the tale over and over again […] they don’t want to have to keep verbally telling everybody.”
SQ6 [Worker] I had one [client] yesterday where she was nearly all fives [out of 5] because they’d made that much progress. I had to put that on. So, she saw that as a real positive […] She was like, “I don’t need your support on this, I don’t need your support on this.”
SQ14 [Worker] “Parents might not want certain information [shared] so it might not be on [the visible data records] anyway…”
SQ9 [Workers] A: Sometimes they might have been out and had a drink, had an argument but the police have been called and it’s recorded as domestic abuse. B: That’s what I’m saying about it [the “domestic violence” label] being overused. A: In isolation, it probably wouldn’t be classed as domestic abuse. It was just an argument.
SQ10 [Worker] I think we make a lot of assumptions on information that we get about families without actually talking to them to find out why.
SQ11 [Workers] A: “I think you should never make a judgement on data …, that data could be wrong.” B: “It takes individuality, working with that person as well, doesn’t it?”
SQ12 [Worker] “It all depends on what data they’ve got. You take that family I worked with, if there was nothing on there about the mental health, she just looked like a really, really poor parent when in fact she’s not. I think a lot of the professionals over the years have just thought that. So, I disagree [with that sentence].”
SQ13 [Researcher (A) & Workers (B, C)] A: “Was that fair and appropriate and is that accurate in terms of [what data has been viewed]?” B: “I think it would be fair… I think for me it’s fair if it’s current because…” C: “It can only be fair if it’s complete, [if] you’ve got all the information there.”
SQ15 [Worker] “They [families] don’t like people knowing what’s going on in their lives.”
SQ17 [Worker] “You often get [that] by the time they’ve got back from the doctors, it’s ten times worse than the conversation actually was and three other things were thrown in and then they started spiralling out of control thinking about ‘What has been said behind my back?’ sort of thing.”
SQ18 [Worker] “It hasn’t been explained properly to this family that their information will be shared with other professionals. So, they’ve been left feeling really let down and probably quite angry about it. So, although that information does need to be shared, they [the support workers involved] ought to make the family properly aware that information will be shared.”
SQ20 [Worker] “A lot of the families we work with have got the fear that we’re still social workers or attached to social workers. So, they’re saying, ‘I’m not going to share with you or work with you.’ […] [They might] say, ‘You’re not social services are you? We’re not going to have the kids taken away?’”
SQ23 [Workers] A: I think [the medical data we can access] has to be issue-specific. I think to be able to see somebody’s full medical history is not always relevant to why we’re working with them. B: I had a gran who had residency and the GP sent everything from when she was 15 [including the details of her lost pregnancies]. That wasn’t relevant to what they were doing at the time with the grandchildren and residency. It’s got to be relevant. […] A: Relevant to what you’re doing with the family. B: Yes, relevant with the priorities and the issues what’s affecting them.
SQ24 [Workers] A: “Yes.” [to the sentence ‘Families’ data should be private unless they say it can be shared.’] B: “Unless it’s safeguarding, obviously.” A: […] “It’s private, but I guess if there was a real significant need for us to know or somebody else to know that information for safeguarding…” B: “The law will overrule.”
SQ25 [Workers] A: “Imagine somebody doing that [checking all the different data sources] though, that would be a lot of work, wouldn’t it?” […] B: “But actually, that’s a really good idea to have it all in one place.”
SQ26 [Workers] A: “[In this imagined ideal system] you would press on ‘Mum’ and then get all the data.” B: “You’d get all the data, anything you want.” A: “Crime, financial, just the things that we get. Then everything for Dad.”
SQ30 [Worker] “I think for some parents it will be good for them to visually see it as well. […] So you’re able to give them almost a visual context rather than just talking at them. Different people take information in different ways, don’t they?”
SQ31 [Worker] “I guess the things with [tables of data] is that might just be like a number or a percentage… whereas [using a pie chart or graph] is actually giving some context.”
SQ32 [Workers] A: “A lot of the time they say, ‘I’m not going to get into any more trouble,’ [but with the ability to show them data] you can say, ‘But if you did, this could happen.’” B: “If you get into more bother, you’re going to go straight back down to there [acts pointing at data]. Look where you are now. If you carry on you’re going to end up up there but if you go back, if you continue to smoke that weed and smash that phone box, you’re going to go straight back down to there.”
SQ34 [Workers] A: “[Our idea is] an app for checking that data, with graphs and charts.” B: “That would be amazing if we just sat down with them and handed them [a tablet] and said, ‘We’ve just updated [our records]. Can I just check the accuracy?’”
SQ35 [Researcher(A) & Worker (B)] A: What do you think determines whether [families] do or don’t have an interest in [checking their data]? B: I think the experiences that they’ve had […] If it’s historical to say a safeguarding, [they’ll just think] ‘we know what the process is, we know how things are kept, we’re not going to be able to do anything about it.’ [Worker & Researcher, SQ35]
SQ38 [Worker] “Families don’t know [what] data was being collected anyway […] If they knew what data was being collected about them and why it was being collected about them, I think they would mind – but I think that regardless of the fact whether they can see it or not, a lot of families don’t know how to access it because it all comes in the small print.”
SQ39 [Worker] “Not many families ask to see the case notes, whether it’s a social worker or whether it’s a family partner, other members of the authority or any other services. So […] even if they’ve seen the data, [I’m not sure] whether they’d be confident with everything that’s been on it.”
SQ40 [Worker] “Some families will go, ‘Well you know that information because it’s all there somewhere.’ We’re like, ‘Yes, but we don’t want to trawl back to eight years ago.’ There’s reams and reams and reams of it [data].”
SQ41 [Worker] “The information that we hold […] you would verbalise this as well when you go to visit the family. But what we [imagine] is expanding that a little bit more so: explaining why we hold the information that we hold, the process of why we store data, the information that we’ve got.”
SQ42 [Worker] “A lot of […] families talk to us about data we’ve collected and not one family I’ve ever met has got an issue with that. We go to them and say, ‘We’re aware that you’ve got these issues going on,’ and it might be antisocial behaviour or school attendance, health or a domestic violence incident and they’ve never said, ‘How on earth have you got that information?’”
SQ44 [Worker] “For me, there’s so much data that’s stored. For me, for a parent to understand that through a text or email but just in point form. […] The less written, the better for the parent. [What we need is] a small synopsis […] like a summary view.”
SQ45 [Workers] A: “You know when people do have difficulties in terms of reading, on the computer you [could] press the sound button and it can read it for you. […] like text to audio.” B: “[It needs to be in an] easily understandable format, taking into account the family’s needs.”
SQ46 [Workers] A: “[using a data interface to convey data to families] is quite verbal, isn’t it?” B: “It is. The way you use your words, the way you use your language […] [the] husband’s needs are completely different to what [the] wife’s are. Her levels are really low and your levels are really high. I think that’s about the way you use your words…” A: “It’s how you explain it.”
SQ47 [Workers] A: “In terms of children, [you would need to have] more pictures and it would [need to] be clearer. [… Let’s write down] ‘Using age appropriate information’.” B: “Yes […] so it [would be] tailored content for the individual, if the age is there it might be sensitive information.”
SQ48 [Workers] A: “[There should be] separate data for each member.” B: “So really, if you want to talk to the daughter, she’s not going to see the mum or dad’s data. If you’re talking to the dad, he’s not going to see…” A: “Unless they get permission. So you [could] have a tick box system at the start about who can see what…”
SQ51 [Worker] “[The families would have] a little app which they can log in to and read all their information - what’s recorded about themselves, they can read the consent policy, who we share the information with, who we have shared the information with. If they’re not happy — this would be a read-only app for them — if they’re not happy they can fire off an email to us and let us know what they disagree with or if they want their information taken down or their consent.”
SQ52 [Worker] “You’d just have a different page for each one of the priorities what we work with and all the information stored under there. So our key feature would be you’d be able to have individual family members log in. That would be to prevent the child seeing what mum and dad’s issues were and stuff like that if it wasn’t relevant. You’d be able to select what information is visible to other family members.”
SQ55 [Workers] A: “[It’d be good to have a way to] capture young person’s voice and conversations.” […] B: “Self-help buttons [would be good] so say if somebody is feeling depressed […] There is a lot of self-harm going on at the moment.”
SQ56 [Worker] “[our app design] would allow [families] to record audios and then the workers can then access those transcriptions. […] There’s no chat, it’s just about getting their worries, if they can’t sit and talk to you in a face to face, one on one conversation…”
SQ57 [Worker (A) & Researcher (B)] A: “There’s times when I’ve been totally stuck in terms of getting information from professionals, GP, CAMHS, so I’ll say to the family, ‘I need this information, can you ring and get it?’” B: “So the family point you in the right direction, so they fill in the gaps for you?” A: “Yes.”
SQ58 [Workers] A: “There’s loads of things where [families] make massive improvements, it’s just not recorded. [They might have] changed their diet or lifestyle. There are loads and loads of things…” B: “But it’s not recorded as data.”
SQ62 [Workers] A: ‘I would be inclined to agree because they can’t get away from it.’ B: ‘I think it depends on how you would pass it back, really.’ A: ‘Well, it would be useful in meetings to know that she’d suffered from domestic abuse.’ C: ‘Yes, I can see the benefits and the downsides, yes.’ B: ‘Yes, so, they can shake it off but it also gets in the way.’
SQ63 [Workers] A: “[reading sentence] ‘Asking families for consent to share data just once at the start is enough.’ This is what we do now but how many times, when things go wrong families say to you, ‘I didn’t consent to that, I didn’t. That’s not what you asked me at the beginning.’ […] I don’t know if there should be a regular…” B: “…like an update, because things change in their life.” […] A: “[Should] we then [have] reviews, every six weeks [or so …], say to them, ‘Well let’s just remind each other what share consent is for and about.’? […] Obviously it’s got to be regularly done because […] circumstances change.”
SQ64 [Workers] A: “[You would] click on the feed [an imagined feed of updates concerning the family] and it would bring up if they’ve been in trouble.” B: “Absolutely. This [would] definitely [be] your perspective of families.”
SQ65 [Workers] A: “We would get a report through to say…” B: “They’ve recorded something.” A: “Yes. Then I suppose we would follow it up […] face to face.”
SQ67 [Researcher (A) & Worker (B)] A: “So is the key point of this one, that the families have input, as well and agree on what is put on there?” B: “Yes, so, agree on it and then they can add their signature.”
SQ72 [Worker] “You will have parents who will say that they don’t want to share because they know the consequences. One of our families, the little one, she’s six, and there was a DV [= Domestic Violence] incident and her mum was like, ‘Don’t say anything at school.’”
SQ75 [Worker] “[This imagined data interface] would be accessible to both worker and family member so that we can be in sync but [would be] encouraging the family to take full accountability for their own responsibilities.”
SQ76 [Worker] “Let’s say dad was sexually abused when he was a child, I think that’s important that we know that because dad could have mental health problems now which would be a result and we didn’t know that and he didn’t want to speak about it.”
SQ77 [Researcher (A) & Workers (B,C)] A: “Was that fair and appropriate and is that accurate in terms of [what data has been viewed]?” B: “I think it would be fair… I think for me it’s fair if it’s current because…” C: “It can only be fair if it’s complete, you’ve got all the information there.”
SQ78 [Worker] “So maybe you’ve got groups of young people who are, I don’t know, there’s something going on maybe in [local park], you’ve got some antisocial behaviour and they might be putting on their things that they like to do it with their friends. Then we pull from that, actually you’ve got a group of these young people who are involved in this. Then from that you can have focus groups. So, I think [if] we all as family partners know that we’ve got groups of young people where they are hanging out together so instead of just being one worker, I might think, ‘Well actually, there’s so many people in my team have got these kids so we can have a focus group.’”
CQ1 [Worker (A) & Parents(B, C)] A: “I think most families wouldn’t think about [checking their data] until […] something happens and they go, ‘Hang on a minute, that’s not right.’” B: “Yes, ‘Where’ve you got that from?’” C: “Yes, yes.” A: “But I think, other than that, we tend to just trust that everything that has been put down is right, don’t we?” C: “Yes.”
CQ2 [Worker] “That happens a lot, doesn’t it? It does happen where information is shared and then somebody gets upset because they didn’t think that level of information would be made available, even though permission had been given at the start of the plan.”
CQ8 [Parents (A, D), Worker (B)& Researcher(C)] A: if you find [a criminal record for burglary], you’re looking and thinking, “God! She’s gone out and committed a bloody burglary.” B: Well, it could affect your employment chances if that comes back on your DBS. But I explored it and talk about it and she said, “Well, I don’t agree with that. That’s not what happened.” I mean, she did break in but she wasn’t stealing anyone else’s stuff, it was her own stuff. […] If there is breaking and entering and burglary, and no explanation of that, and no way for that person to give you an explanation … C: It’s just somebody’s version of what happened? B: Well, it is, isn’t it? D: Well, the Courts need to change what’s recorded because if you broke into a house and stole a telly, that would come to the top. Whereas, something like that, which is more or less trespassing. In the eyes of any decent solicitor, it’s trespassing, to get your own stuff but, technically, you’ve stolen your own stuff. That should be put on a scale of severity, of 1 to 5, in the circumstances. If you’re homeless and you break into an empty house, is that burglary? Is that worth three years in prison? You know what I mean? [Parents, Worker & Researcher, CQ8]
CQ11 [Parent (A) & Researcher (B)] A: “I would want to see what information is held about me but then there are people out there who aren’t very confident in being able to ask or if they can’t read, if they’ve got learning [difficulties]” B: “What should happen for those people then?” A: “They should be supported by whoever is around them to access it in some form or another.” B: “They need to have someone talk them through it, or something?” A: “Yes.”
CQ12 [Parent] “I think a lot of people would like to be able to [access their data]. I think the prospect of, if you want to see your medical records […] having to make an appointment and go up and sit down and read paper records [is not something people would choose, whereas] if they were able to access it, in their own time, at their own pace [that would work better]. I’d love to see what’s been written about me in my medical records, I think some of it could be quite interesting.”
CQ15 [Parent] “I think [whether support workers should be able to access mental health details] depends on how long ago it was. […] I went through a really, really rough patch […] nearly 20 years ago and I had a brief patch of about three weeks where I was really not controlling my depression and I self-harmed and made an absolute fool of myself, and I’m fine with that now but I wouldn’t want people, everybody, to know about that because I wouldn’t want people to jump to the conclusion — because they still do — that there’s something wrong and I’m going to do it again and things like that. Because people change, and situations change.”
CQ17 [Worker] “I think most families wouldn’t think about [looking at or checking their data] until […] something happens and they go, ‘Hang on a minute, that’s not right.’”
In this section, the methodology used for the analysis of data from Case Study Two is explained. The content of this appendix is identical to Appendix 3 in the Supplemental Materials of the CHI 2022 paper from this study (Bowyer, Holt, et al., 2022). Case Study Two was written first as a paper and then expanded to produce Chapter 5. While the paper was co-written, Chapter 5 was written entirely by Alex Bowyer.
All coding was carried out by Alex Bowyer and Jack Holt, who worked through the following process over a nine-month period, comprising at least 200 person-hours:
Some additional detail on the stages:
1. Semi-Quantitative Data Extraction & Analysis
Before coding began, responses to some key closed questions from the transcripts were combined with field notes, response emails from companies forwarded by participants, sketches and tables from Interview 1/2, data from the interview 2/3 spreadsheet cells, and other data collected, and used to populate a spreadsheet that featured summaries of those responses. For example, where participants had been asked to outline their hopes for the outcomes of their GDPR data requests, these responses were recorded on the spreadsheet to be used as a resource for summarising participant hopes in a manner that could be easily quantified and referred back to. In some cases this data was analysed within the spreadsheet to produce insights, graphs and percentages. Such data was later used to support and illustrate findings from the coding process. This spreadsheet also included important information relating to each participant’s GDPR process experience, such as the timeliness and completeness of their data returns, which could serve as a reference point when analysing the transcripts.
The semi-quantitative data areas captured or derived from captured data were:
2. Text File Processing (Splitting & Recombination)
The researchers then moved on to prepare for the fully qualitative analysis. All interview audio was auto-transcribed using Zoom and Google Recorder, and then the generated text files were cleaned. Cleaning consisted of listening to sections of audio where transcription seemed inaccurate and correcting the transcripts. Due to the volume of data, this cleaning was not done for all texts, only where ambiguity or typos meant it was needed for accurate coding and for quotes. Some anonymisation of source texts was also carried out at this stage and later, with a particular focus on quotes included in the chapter. The researchers used this data preparation stage as an initial means of (re)familiarising themselves with the dataset. With reference to the structured interview schedules, the initial 33 text transcripts were split up by participant, company and topic using the labelling scheme outlined in ‘Text File Labelling Strategy’ below.
At the end of this process, roughly 100 ‘pieces’ had been identified for each participant (slightly more for P11, whose interview 1 covered a broader scope, and considerably fewer for P9, who only did interview 1).
3. Categorisation into CSVs
The pieces from stage 2 were then recombined, across all participants, into 233 source files. These 233 source files were then further grouped into six topic areas. (The aim of the analysis was to identify common opinions and ideas around different topics, not to explore individual participant journeys end-to-end.) The six topic areas were:
This produced too many files for import into Quirkos Cloud, so once organised by topic, these six groups of files were further combined into 11 General files and 46 Company-Specific files (with Life and General going into the General files and everything else going into Company-Specific). This gave 57 organised CSV files ready for use in the first coding phase.
4. Inductive Coding
The majority of the analysis took place with the use of Quirkos Cloud (Daniel Turner, 2014), a computer-assisted qualitative data analysis software (CAQDAS) package that allows for collaborative analysis by more than one researcher. The 57 files from stage 3 were imported into Quirkos Cloud, with each having a unique number. The sources in Quirkos were labelled by Participant, Company and Topic for easy search and retrieval. The researchers then collaboratively coded sections of the interview transcripts to develop and ensure a consistent approach, based on established techniques (Huberman and Miles, 2002; Braun and Clarke, 2006). Codes were identified inductively and not according to a fixed or predetermined set. Once a baseline codeset and strategy had been established, they each coded sections of interviews in parallel, regularly regrouping to discuss generated codes and any new questions or challenges arising. At first, these codes were created in an unstructured/flat state with only occasional clustering on the Quirkos interface. Due to the volume of data, not every piece of every transcript was coded; however, care was taken to ensure that a representative sample of views from across the participant pool was included. The codes were clustered into loose code-topic areas; an example is shown in the following screenshot, taken approximately 6 weeks into coding:
5. Reductive Cycles
As more codes were identified and structures and commonalities between them were formed, existing codes were merged or absorbed into one another and grouped together in small clusters. The researchers regularly met to discuss each other’s codes according to their context and occasionally amended wording or merged concepts that were labelled differently but semantically equivalent. All codes were checked and agreed between these two researchers. Over time, the codes were iteratively structured and restructured, creating top-level thematic clusters around different research questions that held multiple layers of related codes. These clusters were then summarised with a short sentence or paragraph of text, allowing summaries to be produced at different levels of hierarchy. These summaries were kept in the Description fields of codes in Quirkos and also in external structured text-based documents. These can be seen in the following screenshot, taken 5 months into coding:
The above-pictured structure of the coded corpus at the end of the Quirkos Cloud phase was as follows:
Total codes = 645.
6. Theme Identification & Quote Extraction
Having produced the structure above as a reduced representation of ‘what the codes say’ that the participants think, the researchers used the outlining tool Workflowy (Turitzin and Patel, 2010) to develop the arguments and primary narrative of the chapter into a structured, three-theme summary of the most important items from these findings. The code hierarchy was used as source material to populate the three key themes with illustrative quotes and observed findings. An example from later in this process (around 8-9 months after Stage 1 began) is shown in the screenshot below:
The themes are broken down in detail in 5.4 and can be summarised as:
In all, the process from commencing data analysis to writing up thematic findings in the chapter took over 200 person-hours over a 9-month period from January to September 2020.
Text File Labelling Strategy used in Stage 2
In stage 2, text files were initially broken down into small pieces and labelled according to the following strategy:
Interview 1 (Sensitisation / Poster Display Chat)
Break into 5 parts:
- Comp - list of companies
- Type - types of data
- DoWt - potential uses of data [‘what would you do with the data?’]
- GDPR - GDPR
- Motv - motivation for taking part

Interview 1 (Main Sketch Interview)

Break down as follows:

- SktR - review of previous sketch interview from prior study [p11 only]
- DPer - definition of personal data
- DAcc - definition of access to data
- DCon - definition of control of data
- DPow - definition of power
- Sket - sketching
- Anno - annotation
- SelC - company selection
- XXXX - per company [use first four letters of company]
- Powr - power
- Hope - hopes
- Uses - uses
- Wrap - [Wrap up]/What happens next

Format:

NN-pXX-iX-[Comp/Type/Uses/GDPR/Motv]-[company first three letters].txt

e.g. 01-p01-i1-Comp.txt or 02-p01-i1-Powr-Face.txt
Interview 2
Break down as follows:
- XXXX - per company [use first four letters of company name]
- Priv - viewing privacy policy
- Powr - power
- HopU - hopes & uses
- Trst - trust [p10 & p11]
- Pow2 - end power
- Trs2 - end trust
- Hop2 - end hopes and uses

Format:

NN-pXX-iX-[….]-[company first three letters].txt

e.g. 01-p01-i2-Priv-Goog.txt
Interview 3
Break down as follows:
- XXXX - per company [use first four letters of company name]
- Powr - power rating
- Trst - trust rating
- RPow - retro power
- RTrs - retro trust
- Hope - hope (for company) and uses (how well have hopes been met / how practical are the envisaged data uses)
- Data - Overall data overview
- Prov - Data provided by you
- Indr - Data indirectly / automatically collected
- Derv - Data derived about you
- Othr - Data from other sources
- Meta - Metadata
- GenQ - general questions about this company
- Pow2 - end power
- Trs2 - end trust
- Next - what next for this company specifically
- Genr - General topics
- Hope - Hope (general)
- Wrap - Wrap up questions / the future

Format:

NN-pXX-iX-[….]-[company first three letters].txt

e.g. 01-p01-i3-Cred-Indr.txt or 02-p01-i3-Genr-Wrap.txt
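As an illustration only, the labelling scheme above could be parsed programmatically along the following lines. This is a minimal sketch and was not part of the original analysis. The function name `parse_label` and the returned field names are hypothetical. It assumes a two-digit sequence number, a two-digit participant number, a single-digit interview number, and one or two four-character codes, as seen in the examples above. Because the examples place topic and company codes in differing orders, the sketch returns the codes as a list without assigning a meaning to each position.

```python
import re

# Assumed pattern, inferred from the examples (e.g. 02-p01-i1-Powr-Face.txt):
# NN-pXX-iX-CODE[-CODE].txt, where each CODE is four word characters.
LABEL_PATTERN = re.compile(
    r"^(?P<seq>\d{2})-p(?P<participant>\d{2})-i(?P<interview>\d)"
    r"-(?P<first>\w{4})(?:-(?P<second>\w{4}))?\.txt$"
)


def parse_label(filename):
    """Split a labelled text-file name into its components (hypothetical helper)."""
    match = LABEL_PATTERN.match(filename)
    if match is None:
        raise ValueError(f"Filename does not follow the labelling scheme: {filename}")
    return {
        "sequence": int(match["seq"]),
        "participant": f"P{int(match['participant'])}",
        "interview": int(match["interview"]),
        # One or two codes; the second group is None when absent and is dropped.
        "codes": [code for code in (match["first"], match["second"]) if code],
    }


if __name__ == "__main__":
    print(parse_label("02-p01-i1-Powr-Face.txt"))
```

Grouping the parsed labels by participant, interview or code would then give one possible way to recombine the pieces by topic.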
The quality and coverage datapoints described in 5.3.3 also allowed insights to be drawn about which service providers were strongest or weakest in each category and overall. This was done by tallying the ‘Yes’ responses for each category and overall, then dividing by the number of times that provider was selected, to avoid inflating scores for popular companies. The outcome of this analysis is shown in Table ARI5.1. The companies that fared worst overall were those that did not return any data at all in response to a GDPR request (Sainsbury’s, Freeprints, Tyne Tunnels, LinkedIn, Huawei, Bumble, LNER). It should be noted that Sainsbury’s and Huawei did respond, claiming to hold no data for the requesting participant, though participants found this implausible, which indicates a problem with compliance, explanation or trust. The other named companies here did not respond at all, despite at least two follow-up emails being sent to them, and despite in some cases having initially acknowledged and promised to satisfy the request.
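The normalisation described above (tallying ‘Yes’ responses, then dividing by the number of times a provider was selected) can be sketched as follows. This is an illustrative sketch under stated assumptions, not the spreadsheet calculation actually used. The function name `score_providers`, the input shape, the provider names and the category names are all hypothetical placeholders.

```python
from collections import Counter, defaultdict


def score_providers(selections):
    """Compute per-provider scores normalised by how often each provider was selected.

    `selections` is a list of (provider, answers) pairs, one per time a participant
    selected that provider; `answers` maps a category name to 'Yes' or 'No'.
    (This input shape is an assumption made for the sketch.)
    """
    times_selected = Counter()
    yes_counts = defaultdict(Counter)
    categories = set()

    for provider, answers in selections:
        times_selected[provider] += 1
        for category, answer in answers.items():
            categories.add(category)
            if answer == "Yes":
                yes_counts[provider][category] += 1

    scores = {}
    for provider, n in times_selected.items():
        # Per-category score: 'Yes' tally divided by number of selections.
        per_category = {c: yes_counts[provider][c] / n for c in sorted(categories)}
        # Overall score: all 'Yes' responses divided by number of selections.
        per_category["overall"] = sum(yes_counts[provider].values()) / n
        scores[provider] = per_category
    return scores


if __name__ == "__main__":
    # Placeholder data, not drawn from the study.
    example = [
        ("ProviderA", {"provided": "Yes", "derived": "No"}),
        ("ProviderA", {"provided": "Yes", "derived": "Yes"}),
        ("ProviderB", {"provided": "No", "derived": "No"}),
    ]
    print(score_providers(example))
```

Dividing by the selection count means that a provider chosen by many participants does not score higher simply because it attracted more ‘Yes’ responses in total.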
Companies producing responses with good coverage and good quality included Niantic, Nectar and Sunderland AFC as well as to a lesser extent Natural Cycles, Revolut, Spotify, Tesco and Amazon. Facebook and Google fared well for the breadth of data returned (due in part to their download dashboards), though the quality of Google’s data was found lacking across multiple categories. Last.fm (owned by CBS) fared poorly overall due to poor category coverage, despite the limited data that it did return being of high quality.
Table ARI5.1 - Best and Worst Data Holders for GDPR, according to Participants’ Judgements
I took a three-month sabbatical from my PhD in the summer of 2020, during which I undertook a full-time, remote research internship at BBC R&D, the British Broadcasting Corporation (BBC)’s Research and Development (R&D) department (British Broadcasting Corporation, 1997), working with specialists, designers, researchers and developers on an exploratory research project codenamed Cornmarket. I continued this involvement as a part-time research consultant and critical friend for a further 5 months after the conclusion of the initial three-month placement.
As part of its Royal Charter, one of the BBC’s lesser-known obligations is to maintain a centre of excellence for research and development in broadcasting and electronic media, and to this end it employs over 200 researchers in its R&D department looking at everything from AV engineering and production tools to new forms of media, virtual reality, digital wellbeing and human-data interaction (British Broadcasting Corporation, 1997). The Cornmarket project, launched in 2019, is a BBC-internal human-data interaction research project which explores a possible role for the BBC as it moves beyond broadcast television, using its public service responsibility to guide citizens to a position of empowerment within today’s digital landscape - encompassing not just entertainment but health, finance and self-identity. Due to its unique funding from UK-wide TV licensing and its duties to not only entertain but to inform and educate the general public, the BBC is uniquely placed to take a more human-centred approach than commercial innovators in this space as it needs only to deliver value, not profit. The project is exploring the use of Solid (Berners-Lee, 2022) technology to build a working Personal Data Store (PDS) prototype [2.3.4] while also developing, iterating and trialling user interface designs and conducting participatory research interviews and activities, all to explore what form a BBC PDS might take and what features its potential users might value.
The proposed BBC Cornmarket product, internally called My PDS, would allow people to populate a PDS with personal data from APIs and data downloads from a variety of services including BBC iPlayer, Netflix, All4, Spotify, Instagram, Strava, Apple Health, banks and finance companies, as well as social media companies such as Facebook, LinkedIn and Twitter, and then to use these combined data sources to create personal profiles for Health, Finance, Media (i.e. entertainment) and Core, within which various data insights, visualisations and capabilities would be delivered. One feature the work explores in depth as potentially valuable to users is the ability to include and exclude certain datapoints from the imported viewing history data in order to present a more accurate, curated view of oneself that could then be fed back to other applications such as BBC Sounds to give better content recommendations.
With a cross-disciplinary team of around 20 people including architects, developers, user experience designers, product designers, innovators, participatory researchers and marketers, and funding to outsource public engagement research to agencies, this project represents a significant player in the emerging personal data economy [2.3.4]. As such the Cornmarket project is a fertile ground in which to learn more from practitioners in the PDE space and to test the learnings of this thesis in practice while also finding deeper insights in response to my research questions - in particular RQ3 which is concerned with the building of more human-centric personal data interfaces in practice.
Much of the work I did during this extended internship can be seen in the designs within 9.3, as well as the research report I wrote (Bowyer, 2020a) and internship writeup (Bowyer, 2020b). My work with the Cornmarket project can be seen as the concluding part of one of several action research cycles within my PhD [3.2.2].
An additional figure from my time on Cornmarket that was not featured in the main body of the thesis is shown in Figure 7.1 below. This shows a screenshot from a functional prototype tool I produced during a hack week, which allows the user to upload data retrieved via GDPR or download portal, and which proved the concept of programmatically identifying key entities [9.3.3] and time-labelled events for display as life information to users.
A number of articles relating to the Cornmarket project have been published:
Following the conclusion of the funded period of my PhD, I took up a near-full-time position as Project Leader and Personal Data Coach at Hestia.ai (Dehaye, 2019), a startup based in Geneva, Switzerland. Hestia.ai is a company conducting research, developing technologies, and delivering training in the emergent MyData/PDE space [2.3.4]. In essence, the company’s mission is to help individuals and especially collectives to more easily obtain and understand data held about them, and to help them visualise, aggregate and make use of that data. It is an example of a data access and understanding services company as described in 9.5.3.
I was specifically hired to co-lead the digipower project (Härkönen and Vänskä, 2021), for Hestia.ai’s client, Sitra (Sitra, 1967). Sitra is a non-profit organisation in Finland, funded by the Finnish Parliament and accountable to the Finnish people. The goal of the digipower project was to guide 15 European politicians, civil servants and journalists, through the process of obtaining and exploring their own data. The participants were high-profile VIPs, including the former Prime Minister of Finland and former European Commission Vice President, Jyrki Katainen. The goal was to empower those individuals to better understand the workings of the data economy, so that they might be able to influence others and effect change. One of Sitra’s goals is to establish a fairer data economy (Sitra, 2018). Methodologically, the project drew heavily on my own Case Study Two [Chapter 5], adopting a similar method of guiding individuals through the process of making GDPR requests and scrutinising the returned data; I was employed on the project for this expertise. Where it differs from my own Case Study is that the focus of the research was outward, on the data economy and the practices of service providers, rather than inward, on the lived experience of the participants. Other differences included the building and use of software interfaces to provide participants with data visualisations, the use of TrackerControl software to audit mobile phone apps [Insight 12], and the direct analysis of participants’ retrieved personal data by the Hestia.ai research team (whereas my Case Study explicitly avoided handling participants’ personal data). The project resulted in three reports:
At the time of publication of this thesis (August 2022), I continue to be employed by Hestia.ai, working on the research, design and development of tools to help collectives [Insight 10] with data and to make data easier to understand [6.1.2; 7.7], and exploring methods to help people ‘hack the seams’ of digital platforms and services [9.4].
Where the BBC internship has helped me to understand the practicalities of connecting people with their personal data in pursuit of Life Information Utilisation [7.6], my work with Hestia.ai has helped me understand the practicalities of how people might acquire greater Personal Data Ecosystem Control [7.6]. In this sense, both peripheral activities have been highly complementary to developing an overview of the pursuit of HDR in practice.
As a software developer I have been aware for a long time that one of the biggest challenges in building new data interfaces is to gain programmatic access to the necessary data. As part of the trend towards cloud-based services and data-centric business practices, it has become increasingly difficult to access all of the data held about users by service providers. Application Programming Interfaces (APIs) are a technical means for programmers to access a user’s data so that third-party applications may be built using that data. Unfortunately, as a result of commercial incentives to lock users in and keep data trapped (Abiteboul, André and Kaplan, 2015; Bowyer, 2018), much of users’ data can no longer be accessed via APIs [8.4]. While GDPR data portability requests do open up a new option for the use of one’s provider-collected data in third-party applications, this is an awkward and time-consuming route for both users and developers. Web augmentation provides a third possible technical avenue for obtaining data from online service providers. It relies on the fact that a user’s data is loaded to the user’s local machine and displayed within their web browser every time a website is used, and therefore it is possible to extract that data from the browser using a browser extension; this is another seam that can be hacked (see 9.4 and Insight 12). Similarly, once loaded into the browser, a provider’s webpage can be modified to display additional data or useful human-centric functionality that the provider failed to provide.
In order to better understand what is and is not possible using this technique, I participated part-time from 2018 to 2020 as the sole software engineer in a DERC (Digital Economy Research Centre) project. This project was using the web augmentation technique to explore how researchers could improve the information given to users of Just Eat, a takeaway food ordering platform in the UK. Hygiene Rating information for each outlet was added, as well as a feature to enable users to sort by hygiene rating, as shown in Figure ARI7.1. The theoretical basis for this research was published in Goffe et al. (2021, 2022). While this particular use case does not concern personal data, the technology and techniques being used by the project to exploit the browser seam were considered highly relevant to the exploration of HDR-improving possibilities, and the goals of the research project were also human-centric, and consistent with this thesis’s research goals - tackling the hegemony of service providers in order to better serve individual needs.
This is a note about the attribution of insights within Chapter 7, as the ideas in that chapter originated quite differently from those in the rest of the thesis.
This thesis is my own work. All ideas synthesised in Chapter 7 are original. Some of the specific details, theories and ideas presented in Chapter 7 arose or were developed or augmented through my close collaboration, discussion and ideation with other researchers both alongside and prior to the PhD timeframe, including:
Due to these collaborations and the ongoing and parallel nature of many of these projects to my PhD research, it is impossible to precisely delineate the origin of each idea or insight. In practice, ideas from my developing thesis and own thinking informed the projects’ trajectories and thinking, and vice-versa. These ideas would not have emerged in this form without my participation, so they are not the sole intellectual property of others, but equally I would not have reached the same conclusions alone, so the ideas are not solely my own either. All diagrams and illustrations were produced by me, except where specified, and the overall synthesis and framing presented in this chapter is my own original work. Where this chapter includes material from the four peripheral projects [7.2], that material is either already public, or permission has been obtained from the corresponding individuals or project teams.
This table is referenced and contextualised in section 7.4.
| Way of thinking about data | Explanation & Implications |
|---|---|
| Data as property | Data can be considered as a possession. This highlights issues of ownership, responsibility, liability and theft. |
| Data as a source of information about you | Knowing that data contains encoded assertions about you and can be used to derive further conjectures enables thinking about how it might be exploited by others, but also how you can explore and use it yourself for reflection, asking questions, self-improvement and planning. It invites consideration of the right to access, data protection, and issues around accuracy, fairness and misinterpretation / misuse. |
| Data as part of oneself | A photo or recording of you, or a typed note or search that popped into your head could be deeply personal. This lens on data highlights issues around emotional attachment/impact, privacy, and ethics. |
| Data as memory | Data can be considered as an augmentation to one’s memory, a digital record of your life. This lens facilitates design thinking around search and recall, browsing, summarising, cognitive offloading, significance/relevance, and the personal value of data. |
| Data as creative work | Some of the data we produce (e.g. writing, videos, images) can be considered as an artistic creation. This lens enables thinking about attribution, derivation, copying, legacy and cultural value to others. |
| Data as new information about the world | Data created by others can inform us about previously unknown occurrences in our immediate digital life or the wider world. This lens is useful for thinking about discovery, recommendations, bias, censorship, filter bubbles, and who controls the information sources we use, as well as who will see and interpret data that we generate and what effects our data has on others. |
| Data as currency | Many data-centric services require data to be sacrificed in exchange for access to functionality, and some businesses now explicitly enable you to sell your own data. This lens highlights that data can be thought of as a tradable asset, and invites consideration of issues of data’s worth, individual privacy, exploitation and loss of control. |
| Data as a medium for thinking, communicating and expression | Some people collect and organise data into curated collections, or use it to convey facts and ideas, to persuade or to evoke an emotional impact. This lens is useful to consider data uses such as lists, annotation, curation, editing, remixing, visualisation and producing different views of data for different audiences. |
Some leisure categories (namely Shopping and Transport) were included that are not strictly civic data, as these would be useful for exploring issues around ethics. These also provided a reference point for participants to better consider the ‘big data’ benefits of data linking.